A Survey of Optimistic Planning in Markov Decision Processes
Authors
Abstract
We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type of model-predictive control. The space of planning policies is explored optimistically, focusing on areas with the largest upper bounds on the value – or upper confidence bounds, in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees. We describe in detail three such recent algorithms, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example.

Work performed in part while L. Buşoniu was with Team SequeL, INRIA Lille. He is also associated with the Automation Department, Technical University of Cluj-Napoca, Romania.

A Survey of Optimistic Planning in Markov Decision Processes. By Buşoniu, Munos, and Babuška. Copyright © 2012 John Wiley & Sons, Inc.
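The optimistic exploration idea described in the abstract can be illustrated with a minimal sketch for the deterministic case: at each iteration, expand the leaf of the planning tree with the largest upper bound on the value, then return the first action of the deepest/best sequence found. The `step` simulator interface, the reward range [0, 1], and all names below are assumptions for illustration, not the survey's actual implementation.

```python
def optimistic_plan(state, actions, step, gamma=0.9, budget=100):
    """Sketch of optimistic planning in a deterministic MDP.

    Assumed interface: step(state, action) -> (next_state, reward),
    with rewards in [0, 1], so gamma**d / (1 - gamma) upper-bounds the
    discounted value of everything below depth d.
    Returns the first action of the best sequence found.
    """
    # Each node: (state, accumulated discounted reward, depth, first action)
    root = (state, 0.0, 0, None)
    leaves = [root]
    best = root
    for _ in range(budget):
        # Optimism: expand the leaf with the largest upper bound
        # b(leaf) = accumulated reward + gamma**depth / (1 - gamma).
        leaf = max(leaves, key=lambda n: n[1] + gamma ** n[2] / (1 - gamma))
        leaves.remove(leaf)
        s, acc, d, first = leaf
        for a in actions:
            s2, r = step(s, a)
            child = (s2, acc + gamma ** d * r, d + 1,
                     a if first is None else first)
            leaves.append(child)
            if child[1] > best[1]:  # best lower bound found so far
                best = child
    return best[3]
```

In the receding-horizon loop described above, the returned first action would be applied to the system, the next state observed, and planning repeated from there.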
Similar resources
Optimistic Planning in Markov Decision Processes
We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type o...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the...
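The aggregation idea in this snippet can be sketched as follows, under assumed interfaces (the `plan_one_tree` helper and its return format are hypothetical, for illustration only): build one planning tree per random realization of the stochastic environment, then recommend the action whose value, averaged over the forest, is highest.

```python
from collections import defaultdict

def aggregate_forest(plan_one_tree, seeds):
    """Aggregate action recommendations from a forest of planning trees.

    Assumed helper: plan_one_tree(seed) -> dict mapping each first
    action to its estimated value in the tree built on a simulator
    seeded with `seed` (one random realization of the environment).
    """
    totals = defaultdict(float)
    for seed in seeds:
        for action, value in plan_one_tree(seed).items():
            totals[action] += value
    # Pick the action with the highest value averaged over the forest.
    return max(totals, key=lambda a: totals[a] / len(seeds))
```

This follows the snippet's high-level recipe only; the "safe" optimistic strategy used inside each tree is described in the cited paper itself.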
Optimistic planning for Markov decision processes
The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence from the size of the state space. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that op...
Sample-Based Planning for Continuous Action Markov Decision Processes
In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT), addresses planning in continuous action MDPs, directing the exploration of the search tree using insights from recent bandit ...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2012